
Protein Science

Wiley

All preprints, ranked by how well they match Protein Science's content profile, based on 221 papers previously published here. The average preprint has a 0.07% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.

1
Impact of changes in buffer ionic concentration and mutations on a GH1 β-Glucosidase homodimer

Chagas, R. S.; Marana, S. R.

2025-02-12 biochemistry 10.1101/2025.02.10.637493 medRxiv
Top 0.1%
34.3%

Oligomerization is a key feature of protein function, with approximately 30% of proteins exhibiting this trait. The homodimeric form of proteins, such as the GH1 β-Glucosidase from Spodoptera frugiperda (Sfβgly), plays a significant role in enzyme activity. In this study, we investigate the homodimerization of Sfβgly, which forms a cyclic C2 dimer with a well-defined interface. Using size exclusion chromatography and SEC-MALS, we characterized the homodimerization behavior of Sfβgly at equilibrium conditions in different ionic concentrations of phosphate buffer. The dissociation constants (KD) increase with decreasing ionic concentration, suggesting that the hydrophobic effect is central to homodimer formation. Site-directed mutagenesis of key residues at the dimer interface further elucidated the contributions of specific amino acid residues to dimer stability. Mutations affecting both apolar and hydrogen bond-forming residues significantly increased the KD. However, mutations of hydrogen bond-forming residues caused a smaller KD change than apolar residue mutations, suggesting that while the latter is the driving factor in the dimerization, the former may play a crucial role in guiding the monomers' relative orientation. These findings enhance our understanding of protein oligomerization in GH1 β-Glucosidases and its implications for protein design and function.
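
To make the link between KD and dimer population concrete, the following is a minimal sketch of a generic two-state monomer-dimer equilibrium (2M ⇌ D, with KD = [M]²/[D]). The abstract does not state which fitting model the authors used, so the function names, the mass-balance convention (total concentration counted in monomer units), and the example values are assumptions for illustration only.

```python
import math


def monomer_concentration(total_conc: float, kd: float) -> float:
    """Free monomer concentration for 2M <-> D with KD = [M]^2 / [D].

    total_conc is the total protein concentration in monomer units,
    so total_conc = [M] + 2[D]. Both arguments must use the same units.
    """
    if total_conc < 0 or kd <= 0:
        raise ValueError("total_conc must be >= 0 and kd must be > 0")
    # Substituting [D] = [M]^2 / KD into the mass balance gives
    # 2[M]^2 / KD + [M] - total_conc = 0; take the positive root.
    return kd * (math.sqrt(1.0 + 8.0 * total_conc / kd) - 1.0) / 4.0


def dimer_fraction(total_conc: float, kd: float) -> float:
    """Fraction of protein chains that are part of a dimer."""
    if total_conc == 0:
        return 0.0
    return 1.0 - monomer_concentration(total_conc, kd) / total_conc


if __name__ == "__main__":
    # Hypothetical values: same protein concentration, increasing KD.
    for kd in (0.1, 1.0, 10.0):
        print(f"KD = {kd:5.1f}  dimer fraction = {dimer_fraction(1.0, kd):.3f}")
```

Under this model, a larger KD at fixed protein concentration means a smaller dimer fraction, which is the sense in which an increased KD indicates a weakened dimer.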

2
Stability of Alternative Bacteriorhodopsin Folds

Mackin, K. A.; Theobald, D. L.

2023-06-22 biochemistry 10.1101/2023.06.21.545983 medRxiv
Top 0.1%
28.7%

Bacteriorhodopsin is a light-activated proton pump found in archaea and some single-celled eukaryotes (Findlay and Pappin, 1986; Sharma et al., 2006; Spudich et al., 2000). This protein adopts the 7TM fold used by all type I and type II rhodopsins. In previous work, we used bacteriorhodopsin from Haloterrigena turkmenica to demonstrate that substantially altered protein folds exhibit light-activated proton pumping (Kamo et al., 2006; Mackin et al., 2014). In this work, we further characterized these novel folds by assessing the stability of our mutants. We used SDS denaturation to calculate the change in unfolding free energy relative to the wild-type (Cao et al., 2012). We also determined the extinction coefficient for each mutant. These results demonstrate that even dramatic structural rearrangements do not critically destabilize the protein, although the extinction coefficient does vary independently of the stability and the proton-pumping activity. Interestingly, the position of the A helix in the protein sequence has the largest effect on the stability of the mutant; those mutants where A is not located at a terminus are destabilized compared to the wild-type.

3
Global Analysis of Aggregation Determinants in Small Protein Domains

Martell, C. M.; Gebis, K. K.; Van, H. M.; Gutierrez, Y. M.; Jung, M. D.; Savas, J. N.; Rocklin, G. J.

2025-11-12 biophysics 10.1101/2025.11.11.687847 medRxiv
Top 0.1%
28.3%

Protein aggregation is an obstacle for engineering effective recombinant proteins for biotechnology and therapeutic applications. Predicting protein aggregation propensity remains challenging due to the complex interplay of sequence, structure, environmental factors, and external stress conditions, particularly for globular proteins. To understand the determinants of aggregation and improve its prediction, we quantified insoluble aggregation following high temperature and acidic stress in custom libraries of small protein domains (40-72 amino acids) using a high-throughput, in vitro, mass spectrometry-based method. In total, we quantified aggregation for 18,987 small protein domains, revealing diverse stress-dependent aggregation phenotypes that were consistent in different library contexts. We also found that aggregation measurements on individually purified proteins strongly correlated with high-throughput mixed-pool data (Pearson's r = 0.65-0.79), supporting the use of multiplexed approaches to study aggregation. Using machine learning, we identified sequence and structural features that correlate with aggregation and fine-tuned the protein language model SaProt, which explained 43-55% of the observed variation in a held-out test set of unrelated protein domains. Our model shows promising utility for engineering aggregation-resistant proteins, and our dataset serves as an important resource for developing improved models of protein aggregation.

4
Optimal TELSAM-Target Protein Linker Character is Target Protein-Dependent

Pedroza Romo, M. J.; Moody, J. D.; Keliiliki, A.; Averett, J.; Gonzalez, J.; Noakes, E.; Wilson, E.; Smith, C.; Averett, B.; Hansen, D.; Nickles, R.; Bradford, M.; Soleimani, S.; Smith, T.; Nawarathnage, S.; Samarawickrama, P.; Kelsch, A.; Bunn, D.; Abiodun, W.; Tsubaki, E.; Doukov, T.; Brown, S.; Stewart, C.

2025-09-02 biochemistry 10.1101/2025.08.29.672704 medRxiv
Top 0.1%
26.6%

Fusing a variant of the sterile alpha motif domain of the human translocation ETS leukaemia protein (TELSAM) to a protein of interest has been shown to significantly enhance crystallization propensity. TELSAM is a pH-dependent, polymer-forming protein crystallization chaperone which, when covalently fused to a protein of interest, forms a stable, well-ordered crystal lattice. However, despite its success, a challenge persists in that crystal quality and diffraction limits appear to be heavily dependent on the choice of linker between TELSAM and the protein of interest, with identification of a functional linker relying on trial-and-error methods. Likewise, previous studies revealed that the 10xHis tag at the TELSAM N-terminus can either facilitate or hinder the ordered crystallization of target proteins attached via flexible or semi-flexible linkers. To address these challenges, we designed multiple constructs with several types of linkers--rigid (helical fusion), semi-flexible (Pro-Alan), and flexible (poly-Gly)--of varying lengths to fuse a designed ankyrin repeat protein (DARPin) to the TELSAM C-terminus. Semi-flexible and flexible linker constructs were made with and without the 10xHis tag. Our findings indicate that short semi-flexible and rigid linkers consistently yield large crystals within 24 hours with a DARPin target protein, but that flexible linkers perform best with a TNK1 UBA domain target protein. Removing the 10xHis tag enhanced crystallization rates, improved crystal morphology, and increased the crystallization propensity of semi-flexible and flexible linker constructs. While removing the His tag did not have a significant effect on crystal size, it improved the diffraction limits and crystal quality of the 1TEL-PA-DARPin construct. These results suggest that the ideal linker selection primarily depends on the properties of the target protein. 
Our data support the recommendation to use a short yet flexible or semi-flexible linker between TELSAM and the target protein to facilitate protein crystallization and high-resolution structure determination. Synopsis: In this study, we examine the effect of short to medium-length flexible, semi-flexible, and rigid linkers on the crystallization of a DARPin fused to the 1TEL protein crystallization chaperone, demonstrating that while rigid linkers impair crystallization and reduce diffraction quality, the ideal linker character remains target-protein dependent.

5
Experimentally-Determined Strengths of Atom-Atom (C, N, O) Interactions Responsible for Protein Self-Assembly in Water: Applications to Folding and Other Protein Processes

Cheng, X.; Shkel, I. A.; O'Connor, K.; Record, T.

2020-05-27 biochemistry 10.1101/2020.05.26.104851 medRxiv
Top 0.1%
26.5%
Show abstract

Folding and other protein self-assembly processes are driven by favorable interactions between O, N, and C unified atoms of the polypeptide backbone and sidechains. These processes are perturbed by solutes that interact with these atoms differently than water does. C=O···HN hydrogen bonding and various π-system interactions have been better-characterized structurally or by simulations than experimentally in water, and unfavorable interactions are relatively uncharacterized. To address this situation, we previously quantified interactions of alkylureas with amide and aromatic compounds, relative to interactions with water. Analysis yielded strengths of interaction of each alkylurea with unit areas of different hybridization states of unified O, N, C atoms of amide and aromatic compounds. Here, by osmometry, we quantify interactions of ten pairs of amides selected to complete this dataset. A novel analysis yields intrinsic strengths of six favorable and four unfavorable atom-atom interactions, expressed per unit area of each atom and relative to interactions with water. The most favorable interactions are sp2O - sp2C (lone pair-π, presumably n-π*), sp2C - sp2C (π-π and/or hydrophobic), sp2O-sp2N (hydrogen bonding) and sp3C-sp2C (CH-π and/or hydrophobic). Interactions of sp3C with itself (hydrophobic) and with sp2N are modestly favorable, while sp2N interactions with sp2N and with amide/aromatic sp2C are modestly unfavorable. Amide sp2O-sp2O interactions and sp2O-sp3C interactions are more unfavorable, indicating the preference of amide sp2O to interact with water. These intrinsic interaction strengths are used to predict interactions of amides with proteins and chemical effects of amides (including urea, N-ethylpyrrolidone (NEP), and polyvinyl-pyrrolidone (PVP)) on protein stability.
Significance: Quantitative information about strengths of amide nitrogen-amide oxygen hydrogen bonds and π-system and hydrophobic interactions involving amide-context sp2 and/or sp3 carbons is needed to assess their contributions to specificity and stability of protein folds and assemblies in water, as well as to predict or interpret how urea and other amides interact with proteins and affect protein processes. Here we obtain this information from thermodynamic measurements of interactions between small amide molecules in water and a novel analysis that determines intrinsic strengths of atom-atom interactions, relative to water and per unit area of each atom-type present in amide compounds. These findings allow prediction or interpretation of effects of any amide on protein processes from structure, and may be useful to analyze protein interfaces.

6
Decreasing the flexibility of the TELSAM-target protein linker and omitting the cleavable fusion tag improves crystal order and diffraction limits

Gajjar, P.; Pedroza-Romo, M. J.; Litchfield, C.; Callahan, M.; Redd, N.; Nawarathnage, S. D.; Soleimani, S.; Averett, J.; Wilson, E.; Lewis, A.; Stewart, C.; Tseng, Y. J.; Doukov, T.; Lebedev, A.; Moody, J. D.

2023-05-15 biochemistry 10.1101/2023.05.12.540586 medRxiv
Top 0.1%
26.5%

TELSAM crystallization promises to become a revolutionary tool for the facile crystallization of proteins. TELSAM can increase the rate of crystallization and form crystals at low protein concentrations without direct contact between TELSAM polymers and, in some cases, with very minimal crystal contacts overall (Nawarathnage et al., 2022). To further understand and characterize TELSAM-mediated crystallization, we sought to understand the requirements for the composition of the linker between TELSAM and the fused target protein. We evaluated four different linkers (Ala-Ala, Ala-Val, Thr-Val, and Thr-Thr) between 1TEL and the human CMG2 vWa domain. We compared the number of successful crystallization conditions, the number of crystals, the average and best diffraction resolution, and the refinement parameters for the above constructs. We also tested the effect of the fusion protein SUMO on crystallization. We discovered that rigidification of the linker improved diffraction resolution, likely by decreasing the number of possible orientations of the vWa domains in the crystal, and that omitting the SUMO domain from the construct also improved the diffraction resolution. Synopsis: We demonstrate that the TELSAM protein crystallization chaperone can enable facile protein crystallization and high-resolution structure determination. We provide evidence to support the use of short but flexible linkers between TELSAM and the protein of interest and to support the avoidance of cleavable purification tags in TELSAM-fusion constructs.

7
Design to Data for mutants of β-glucosidase B from Paenibacillus polymyxa: Y333F, A88E, L219Q, A408H, Y173L, E340S, and Y422F

Maduros, A.; Farinsky, L.; Tagkopoulos, P.; Vater, A.; Siegel, J. B.

2026-02-05 biochemistry 10.64898/2026.02.04.703908 medRxiv
Top 0.1%
25.9%

This study explores computational design predictions related to experimental enzyme behavior by analyzing seven single-point mutants of β-glucosidase B (BglB) from Paenibacillus polymyxa: Y333F, A88E, L219Q, A408H, Y173L, E340S, and Y422F. Each mutation was modeled using Foldit Standalone, and mutant selections were based on predicted thermodynamic stability changes of interest. Six of the seven mutants in this set yielded soluble, expressed protein. Most variants had similar catalytic efficiency compared to the wild type with one exception. The melting temperatures for most variants were also similar to the wild type. Correlation analysis revealed weak but potentially informative relationships between predicted ΔTSE and (a) thermal stability and (b) catalytic efficiency. These results further support known limitations of TSE score as a tool for single point mutation design and add to a growing dataset being generated to build the next generation of functionally predictive protein models.

8
A method for predicting evolved fold switchers exclusively from their sequences

Kim, A. K.; Looger, L. L.; Porter, L.

2020-02-20 bioinformatics 10.1101/2020.02.19.956805 medRxiv
Top 0.1%
23.4%

Although most proteins with known structures conform to the longstanding rule-of-thumb that high levels of aligned sequence identity tend to indicate similar folds and functions, an increasing number of exceptions is emerging. In spite of having highly similar sequences, these "evolved fold switchers" (1) can adopt radically different folds with disparate biological functions. Predictive methods for identifying evolved fold switchers are desirable because some of them are associated with disease and/or can perform different functions in cells. Previously, we showed that inconsistencies between predicted and experimentally determined secondary structures can be used to predict fold switching proteins (2). The usefulness of this approach is limited, however, because it requires experimentally determined protein structures, whose magnitude is dwarfed by the number of genomic proteins. Here, we use secondary structure predictions to identify evolved fold switchers from their amino acid sequences alone. To do this, we looked for inconsistencies between the secondary structure predictions of the alternative conformations of evolved fold switchers. We used three different predictors in this study: JPred4, PSIPRED, and SPIDER3. We find that overall inconsistencies are not a significant predictor of evolved fold switchers for any of the three predictors. Inconsistencies between α-helix and β-strand predictions made by JPred4, however, can discriminate between the different conformations of evolved fold switchers with statistical significance (p < 1.7 × 10⁻¹³). In light of this observation, we used these inconsistencies as a classifier and found that it could robustly discriminate between evolved fold switchers and evolved non-fold-switchers, as evidenced by a Matthews correlation coefficient of 0.90. These results indicate that inconsistencies between secondary structure predictions can indeed be used to identify evolved fold switchers from their genomic sequences alone.
Our findings have implications for genomics, structural biology, and human health.
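
For readers unfamiliar with the Matthews correlation coefficient quoted above, here is a minimal sketch of how it is computed from a binary confusion matrix. The function name and the example counts are hypothetical and are not taken from the preprint; only the standard MCC formula is assumed.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns a value in [-1, 1]: 1 is perfect agreement, 0 is no better
    than chance, -1 is total disagreement. By convention, 0.0 is returned
    when any marginal total is zero and the coefficient is undefined.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Hypothetical counts for a fold-switcher vs. non-fold-switcher classifier.
    print(round(matthews_corrcoef(tp=45, tn=50, fp=2, fn=3), 3))
```

Unlike plain accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the two classes are unequal in size.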

9
chiLife: An open-source Python package for in silico spin labeling and integrative protein modeling

Tessmer, M. H.; Stoll, S.

2022-12-24 biophysics 10.1101/2022.12.23.521725 medRxiv
Top 0.1%
23.1%

Here we introduce chiLife, a Python package for site-directed spin label (SDSL) modeling for electron paramagnetic resonance (EPR) spectroscopy, in particular double electron-electron resonance (DEER). It is based on in silico attachment of rotamer ensemble representations of spin labels to protein structures. chiLife enables the development of custom protein analysis and modeling pipelines using SDSL EPR experimental data. It allows the user to add custom spin labels, scoring functions and spin label modeling methods. chiLife is designed with integration into third-party software in mind, to take advantage of the diverse and rapidly expanding set of molecular modeling tools available with a Python interface. This article describes the main design principles of chiLife and presents a series of examples. Author summary: Thanks to modern modeling methods like AlphaFold2, RosettaFold, and ESMFold, high-resolution structural models of proteins are widely available. While these models offer insight into the structure and function of biomedically and technologically significant proteins, most of them are not experimentally validated. Furthermore, many proteins exhibit functionally important conformational flexibility that is not captured by these models. Site-directed spin labeling (SDSL) electron paramagnetic resonance (EPR) spectroscopy is a powerful tool for probing protein structure and conformational heterogeneity, making it ideal for validating, refining, and expanding protein models. To extract quantitative protein backbone information from experimental SDSL EPR data, accurate modeling methods are needed. For this purpose, we introduce chiLife, an open-source Python package for SDSL modeling designed to be extensible and integrable with other Python-based protein modeling packages. With chiLife, appropriate SDSL EPR experiments for protein model validation can be designed, and protein models can be refined using experimental SDSL EPR data as constraints.

10
Accurate Protein Domain Structure Annotation with DomainMapper

Manriquez-Sandoval, E.; Fried, S. D.

2022-03-20 bioinformatics 10.1101/2022.03.19.484986 medRxiv
Top 0.1%
22.9%

Automated domain annotation plays a number of important roles in structural informatics and typically involves searching query sequences against Hidden Markov Model (HMM) profiles. This process can be ambiguous or inaccurate when proteins contain domains with non-contiguous residue ranges, and especially when insertional domains are hosted within them. Here we present DomainMapper, an algorithm that accurately assigns a unique domain structure annotation to any query sequence, including those with complex topologies. We validate our domain assignments using the AlphaFold database and confirm that non-contiguity is pervasive (6.5% of all domains in yeast and 2.5% in human). Using this resource, we find that certain folds have strong propensities to be non-contiguous or insertional across the Tree of Life, likely underlying evolutionary preferences for domain topology. DomainMapper is freely available and can be run as a single command line function. Highlights: DomainMapper generates a unique domain structure annotation, including non-contiguous and insertional domains. Automated annotations of non-contiguous domains are validated against the AlphaFold database. DomainMapper can be easily installed and used by non-experts. Certain folds have strong preferences to be non-contiguous or insertional.

11
Tertiary-interaction characters enable fast, model-based structural phylogenetics beyond the twilight zone

Puente-Lelievre, C.; Malik, A. J.; Douglas, J.; Ascher, D.; Baker, M.; Allison, J.; Poole, A. M.; Lundin, D.; Fullmer, M.; Bouckaert, R.; Steinegger, M.; Matzke, N. J.

2023-12-13 evolutionary biology 10.1101/2023.12.12.571181 medRxiv
Top 0.1%
22.9%

Protein structure is more conserved than protein sequence, and therefore may be useful for phylogenetic inference beyond the "twilight zone" where sequence similarity is highly decayed. Until recently, structural phylogenetics was constrained by the lack of solved structures for most proteins, and the reliance on phylogenetic distance methods which made it difficult to treat inference and uncertainty statistically. AlphaFold has mostly overcome the first problem by making structural predictions readily available. We address the second problem by redeploying a structural alphabet recently developed for Foldseek, a highly-efficient deep homology search program. For each residue in a structure, Foldseek identifies a tertiary interaction closest-neighbor residue in the structure, and classifies it into one of twenty "3Di" states. We test the hypothesis that 3Dis can be used as standard phylogenetic characters using a dataset of 53 structures from the ferritin-like superfamily. We performed 60 IQtree Maximum Likelihood runs to compare structure-free, PDB, and AlphaFold analyses, and default versus custom model sets that include a 3Di-specific rate matrix. Analyses that combine amino acids, 3Di characters, partitioning, and custom models produce the closest match to the structural distances tree of Malik et al. (2020), avoiding the long-branch attraction errors of structure-free analyses. Analyses include standard ultrafast bootstrapping confidence measures, and take minutes instead of weeks to run on desktop computers. These results suggest that structural phylogenetics could soon be routine practice in protein phylogenetics, allowing the re-exploration of many fundamental phylogenetic problems.

12
pOPIN-GG: A resource for modular assembly in protein expression vectors

Bentham, A. R.; Youles, M.; Mendel, M. N.; Varden, F. A.; De la Concepcion, J. C.; Banfield, M. J.

2021-08-10 biochemistry 10.1101/2021.08.10.455798 medRxiv
Top 0.1%
22.8%

The ability to recombinantly produce target proteins is essential to many biochemical, structural, and biophysical assays that allow for interrogation of molecular mechanisms behind protein function. Purification and solubility tags are routinely used to maximise the yield and ease of protein expression and purification from E. coli. A major hurdle in high-throughput protein expression trials is the cloning required to produce multiple constructs with different solubility tags. Here we report a modification of the well-established pOPIN expression vector suite to be compatible with modular cloning via Type IIS restriction enzymes. This allows users to rapidly generate multiple constructs with any desired tag, introducing modularity in the system and delivering compatibility with other modular cloning vector systems, for example streamlining the process of moving between expression hosts. We demonstrate these constructs maintain the expression capability of the original pOPIN vector suite and can also be used to efficiently express and purify protein complexes, making these vectors an excellent resource for high-throughput protein expression trials. Highlights: pOPIN-GG expression vectors allow for modular cloning enabling rapid screening of purification and solubility tags at no loss of expression compared to previous vectors. Cloning into the pOPIN-GG vectors can be performed from PCR products or from level 0 vectors containing the required parts. Several vectors with different resistances and origins of replication have been generated allowing the effective co-expression and purification of protein complexes. All pOPIN-GG vectors generated here are available on Addgene, as well as level 0 acceptors and tags.

13
Computational design of a protein family that adopts two well-defined and structurally divergent de novo folds

Wei, K. Y.; Moschidi, D.; Bick, M. J.; Nerli, S.; McShan, A. C.; Carter, L. P.; Huang, P.-S.; Fletcher, D. A.; Sgourakis, N. G.; Boyken, S. E.; Baker, D.

2019-09-04 biochemistry 10.1101/597161 medRxiv
Top 0.1%
22.5%

The plasticity of naturally occurring protein structures, which can change shape considerably in response to changes in environmental conditions, is critical to biological function. While computational methods have been used to de novo design proteins that fold to a single state with a deep free energy minimum (Huang et al., 2016), and to reengineer natural proteins to alter their dynamics (Davey et al., 2017) or fold (Alexander et al., 2009), the de novo design of closely related sequences which adopt well-defined, but structurally divergent structures remains an outstanding challenge. Here, we design closely related sequences (over 94% identity) that can adopt two very different homotrimeric helical bundle conformations -- one short (~66 Å height) and the other long (~100 Å height) -- reminiscent of the conformational transition of viral fusion proteins (Ivanovic et al., 2013; Podbilewicz, 2014; Skehel and Wiley, 2000). Crystallographic and NMR spectroscopic characterization show that both the short and long state sequences fold as designed. We sought to design bistable sequences for which both states are accessible, and obtained a single designed protein sequence that populates either the short state or the long state depending on the measurement conditions. The design of sequences which are poised to adopt two very different conformations sets the stage for creating large scale conformational switches between structurally divergent forms.

14
Environmental and mutational modulation of collateral fitness effects informs their mechanisms

Goff, C.; Tsou, E.-Y.; Mehlhoff, J. D.; Ostermeier, M.

2026-01-23 evolutionary biology 10.64898/2026.01.22.699087 medRxiv
Top 0.1%
22.5%

Fitness effects of mutations that do not arise from changes in a protein's ability to perform its physiological functions (called collateral fitness effects or CFEs) are an understudied aspect of fitness landscapes. We have previously systematically measured the CFEs of all possible single amino acid substitutions in four proteins and found the frequency of deleterious mutations to vary by two orders of magnitude. Of these proteins, TEM-1 β-lactamase had the highest frequency, and deleterious mutations caused TEM-1 aggregation. Here, we systematically measured TEM-1 collateral fitness landscapes in environments and situations expected to alter protein aggregation or protein stability. We found a moderate correlation between deleterious CFEs and predicted thermodynamic stability effects in TEM-1's α-domain. Empirically, we found that the frequency and magnitude of deleterious CFEs can be reduced by altering the growth environment to disfavor aggregation (i.e. reducing the growth temperature or shifting to minimal media) or by stabilizing TEM-1 (via the M182T mutation or the addition of the β-lactamase inhibitor avibactam to the growth medium). However, although raising the growth temperature to favor aggregation exacerbated deleterious CFEs of many mutations, many mutations' effects were reduced. Furthermore, although reductions in CFEs occurred with reductions in TEM-1 aggregation for some mutants, for many mutants they did not. We propose that mutational destabilization exposes protein motifs that can cause deleterious CFEs, but that these motifs and those that cause aggregation are not necessarily the same motifs.

15
Insight into Polyproline II Helical Bundle Stability and Folding from NMR Spectroscopic Characterization of the Snow Flea Antifreeze Protein Denatured State

Trevino, M. A.; Redondo Moya, M.; Lopez Sanchez, R.; Pantoja-Uceda, D.; Mompean, M.; Laurents, D. V.

2022-03-12 biophysics 10.1101/2022.03.10.483783 medRxiv
Top 0.1%
22.4%

The use of PPII helices in protein design is currently hindered by limitations in our understanding of their conformational stability and folding. Recent studies of the snow flea antifreeze protein (sfAFP), a useful model system composed of six PPII helices, suggested that a low denatured state entropy contributes to folding thermodynamics. To get atomic level information on the conformational ensemble and entropy of the reduced denatured state of sfAFP, we have analyzed its chemical shifts and {1H}-15N relaxation parameters by NMR spectroscopy at three experimental conditions. No significant populations of preferred secondary structure were detected. The stiffening of certain N-terminal residues at neutral versus acidic pH leads us to suggest that favorable charge-charge interactions could bias the conformational ensemble to favor the formation of the two disulfide bonds during nascent folding. Despite a high content of flexible glycine residues, the mobility of the sfAFP denatured ensemble is similar to that of denatured α/β proteins both on fast ps/ns as well as slower μs/ms timescales. These results are in line with a conformational entropy in the denatured ensemble resembling that of typical proteins and suggest that new structures based on PPII helical bundles should be amenable to protein design.

16
Sifting Through the Noise: A Computational Pipeline for Accurate Prioritization of Protein-Protein Binding Candidates in High-Throughput Protein Libraries

Mondal, A.; Singh, B.; Felkner, R. H.; De Falco, A.; Swapna, G.; Montelione, G. T.; Roth, M.; Perez, A.

2024-01-23 biophysics 10.1101/2024.01.20.576374 medRxiv
Top 0.1%
22.3%

Identifying the interactome for a protein of interest is challenging due to the large number of possible binders. High-throughput experimental approaches narrow down possible binding partners, but often include false positives. Furthermore, they provide no information about what the binding region is (e.g. the binding epitope). We introduce a novel computational pipeline based on an AlphaFold2 (AF) Competition Assay (AF-CBA) to identify proteins that bind a target of interest from a pull-down experiment, along with the binding epitope. Our focus is on proteins that bind the Extraterminal (ET) domain of Bromo and Extraterminal domain (BET) proteins, but we also introduce nine additional systems to show transferability to other peptide-protein systems. We describe a series of limitations to the methodology based on intrinsic deficiencies to AF and AF-CBA, to help users identify scenarios where the approach will be most useful. Given the speed and accuracy of the methodology, we expect it to be generally applicable to facilitate target selection for experimental verification starting from high-throughput protein libraries.

17
Consensus Finder web tool to predict stabilizing substitutions in proteins

Jones, B. J.; Kan, C. N. E.; Luo, C.; Kazlauskas, R.

2020-06-30 bioengineering 10.1101/2020.06.29.178418 medRxiv
Top 0.1%
22.2%

The consensus sequence approach to predicting stabilizing substitutions in proteins rests on the notion that conserved amino acids are more likely to contribute to the stability of a protein fold than non-conserved amino acids. To implement a prediction for a target protein sequence, one finds homologous sequences and aligns them in a multiple sequence alignment. The sequence of the most frequently occurring amino acid at each position is the consensus sequence. Replacement of a rarely occurring amino acid in the target with a frequently occurring amino acid is predicted to be stabilizing. Consensus Finder is an open-source web tool that automates this prediction. This chapter reviews the rationale for the consensus sequence approach and explains the options for fine-tuning this approach using Staphylococcus nuclease A as an example.
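The procedure described in this abstract (align homologs, take the most frequent residue per column, flag positions where the target carries a rare residue) can be sketched in a few lines of Python. This is a minimal illustration, not the Consensus Finder implementation: the function names, the toy alignment, and the two frequency thresholds are all assumptions made for the example, and the alignment is assumed to be precomputed with "-" marking gaps.

```python
from collections import Counter


def consensus_sequence(alignment):
    """Return the most frequent non-gap residue in each alignment column."""
    consensus = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != "-")
        consensus.append(counts.most_common(1)[0][0] if counts else "-")
    return "".join(consensus)


def suggest_substitutions(aligned_target, alignment,
                          min_consensus_freq=0.6, max_target_freq=0.2):
    """List substitutions (e.g. 'R2K') where the target residue is rare
    and the consensus residue is common. Both thresholds are hypothetical."""
    suggestions = []
    position = 0  # residue numbering in the ungapped target
    for target_res, column in zip(aligned_target, zip(*alignment)):
        if target_res == "-":
            continue
        position += 1
        counts = Counter(res for res in column if res != "-")
        if not counts:
            continue
        total = sum(counts.values())
        consensus_res, consensus_count = counts.most_common(1)[0]
        if (consensus_res != target_res
                and consensus_count / total >= min_consensus_freq
                and counts[target_res] / total <= max_target_freq):
            suggestions.append(f"{target_res}{position}{consensus_res}")
    return suggestions


# Toy alignment of five hypothetical homologs (not real sequences).
alignment = ["MKVLA", "MKILA", "MRILA", "MKILG", "MKIL-"]
print(consensus_sequence(alignment))
print(suggest_substitutions("MRVLG", alignment))
```

Requiring both a common consensus residue and a rare target residue avoids proposing changes at weakly conserved columns, where the most frequent residue says little about stability.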

18
A stickiness scale for disordered proteins

Cao, F.; Tesei, G.; Lindorff-Larsen, K.

2026-01-27 biophysics 10.64898/2026.01.25.701651 medRxiv
Top 0.1%
22.2%

Disordered proteins are a heterogeneous group of proteins that play a broad range of functions in biology, and display conformational properties that range from compact globules to expanded chains. We here describe the results of a data-driven approach to derive a scale that represents the propensity of the twenty amino acids to interact with one another relative to water. The scale is based on biophysical experiments on 115 proteins and can be thought of as a stickiness (or hydropathy) scale specific for disordered proteins. We compare the scale to 70 other previously reported hydropathy scales and find that it is closer to four scales related to membrane proteins or the transition temperatures of elastin-like peptides. We envisage that the new scale will be useful in bioinformatics and machine learning approaches to quantify the role of sequence composition and patterning in disordered proteins, to understand the driving forces for their interactions with other molecules, and their evolutionary conservation.

19
Dissecting the stability determinants of a challenging de novo protein fold using massively parallel design and experimentation

Kim, T.-E.; Tsuboyama, K.; Houliston, S.; Martell, C. M.; Phoumyvong, C. M.; Lemak, A.; Haddox, H. K.; Arrowsmith, C. H.; Rocklin, G. J.

2022-08-03 synthetic biology 10.1101/2021.12.17.472837 medRxiv
Top 0.1%
21.9%

Designing entirely new protein structures remains challenging because we do not fully understand the biophysical determinants of folding stability. Yet some protein folds are easier to design than others. Previous work identified the 43-residue αββα fold as especially challenging: the best designs had only a 2% success rate, compared to 39-87% success for other simple folds (1). This suggested the αββα fold would be a useful model system for gaining a deeper understanding of folding stability determinants and for testing new protein design methods. Here, we designed over ten thousand new αββα proteins and found over three thousand of them to fold into stable structures using a high-throughput protease-based assay. Nuclear magnetic resonance, hydrogen-deuterium exchange, circular dichroism, deep mutational scanning, and scrambled sequence control experiments indicated that our stable designs fold into their designed αββα structures with exceptional stability for their small size. Our large dataset enabled us to quantify the influence of universal stability determinants including nonpolar burial, helix capping, and buried unsatisfied polar atoms, as well as stability determinants unique to the αββα topology. Our work demonstrates how large-scale design and test cycles can solve challenging design problems while illuminating the biophysical determinants of folding.

Significance: Most computationally designed proteins fail to fold into their designed structures. This low success rate is a major obstacle to expanding the applications of protein design. In previous work, we discovered a small protein fold that was paradoxically challenging to design (only a 2% success rate) even though the fold itself is very simple. Here, we used a recently developed high-throughput approach to comprehensively examine the design rules for this simple fold. By designing over ten thousand proteins and experimentally measuring their folding stability, we discovered the key biophysical properties that determine the stability of these designs. Our results illustrate general lessons for protein design and also demonstrate how high-throughput stability studies can quantify the importance of different biophysical forces.

20
The analytical Flory random coil is a simple-to-use reference model for unfolded and disordered proteins

Alston, J. J.; Ginell, G. T.; Soranno, A. S.; Holehouse, A. S.

2023-03-13 biophysics 10.1101/2023.03.12.531990 medRxiv
Top 0.1%
21.3%

Denatured, unfolded, and intrinsically disordered proteins (collectively referred to here as unfolded proteins) can be described using analytical polymer models. These models capture various polymeric properties and can be fit to simulation results or experimental data. However, the model parameters commonly require user decisions, making them useful for data interpretation but less clearly applicable as stand-alone reference models. Here we use all-atom simulations of polypeptides in conjunction with polymer scaling theory to parameterize an analytical model of unfolded polypeptides that behave as ideal chains (ν = 0.50). The model, which we call the analytical Flory Random Coil (AFRC), requires only the amino acid sequence as input and provides direct access to probability distributions of global and local conformational order parameters. The model defines a specific reference state to which experimental and computational results can be compared and normalized. As a proof-of-concept, we use the AFRC to identify sequence-specific intramolecular interactions in simulations of disordered proteins. We also use the AFRC to contextualize a curated set of 145 different radii of gyration obtained from previously published small-angle X-ray scattering experiments of disordered proteins. The AFRC is implemented as a stand-alone software package and is also available via a Google Colab notebook. In summary, the AFRC provides a simple-to-use reference polymer model that can guide intuition and aid in interpreting experimental or simulation results.
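For readers unfamiliar with ideal-chain behavior (ν = 0.50), the textbook Gaussian chain gives a feel for what such a reference model provides. The sketch below is not the AFRC and does not use its sequence-specific parameters: it assumes a generic chain of `n_bonds` segments with a single effective segment length `b`, a placeholder chosen for illustration, and the function names are hypothetical.

```python
import math


def mean_square_end_to_end(n_bonds, b):
    """Ideal chain: <r^2> = N * b^2."""
    return n_bonds * b ** 2


def radius_of_gyration(n_bonds, b):
    """Ideal chain in the long-chain limit: Rg^2 = <r^2> / 6,
    so Rg scales as N^0.5."""
    return math.sqrt(mean_square_end_to_end(n_bonds, b) / 6.0)


def end_to_end_pdf(r, n_bonds, b):
    """Gaussian-chain probability density of the end-to-end distance r."""
    msq = mean_square_end_to_end(n_bonds, b)
    prefactor = (3.0 / (2.0 * math.pi * msq)) ** 1.5
    return 4.0 * math.pi * r ** 2 * prefactor * math.exp(-3.0 * r ** 2 / (2.0 * msq))


# Placeholder values: 100 segments of unit length (arbitrary units).
n_bonds, b = 100, 1.0
print(radius_of_gyration(n_bonds, b))
print(end_to_end_pdf(10.0, n_bonds, b))

# Quadrupling the chain length doubles Rg, the signature of nu = 0.5.
print(radius_of_gyration(4 * n_bonds, b) / radius_of_gyration(n_bonds, b))
```

A measured or simulated radius of gyration can then be expressed as a ratio to the ideal-chain value, which is the kind of normalization against a reference state that the abstract describes.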